A Measure of Disclosure Risk for Microdata
Abstract
Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Estimates of simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures currently used with census data. Second, we show that it may be estimated consistently from sample data without making strong modelling assumptions. This finding is surprising because it contrasts with the properties of the two ‘similar’ established measures. As a result, this measure has potentially useful applications to sample surveys. Moreover, we propose a simple variance estimator and show that it is consistent. We also show that the measure and its estimation may be extended to allow for misclassification of identifying variables and to allow for certain complex sampling schemes. We present a numerical study based upon 1991 census data for some 450,000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point estimator of the measure of risk and its variance estimator hold to a good approximation for these data.
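The verbal definition of the measure can be illustrated with a small sketch. The abstract does not give a formula, so this is one reading of it, under three assumptions: every record carries a combination of identifying variables (a "key"), an intruder who finds a sample-unique key picks uniformly at random among the population units sharing that key, and the population keys are fully known. The function name `match_correct_probability` and the toy data are hypothetical, and the sketch computes the population-level quantity, not the paper's sample-based estimator.

```python
from collections import Counter


def match_correct_probability(population_keys, sample_keys):
    """Share of unique matches that are correct (illustrative reading only).

    Assumption: for each key that occurs exactly once in the sample, the
    intruder guesses uniformly among the F population units with that key,
    so the match is correct with probability 1/F. Pooling over all
    sample-unique keys gives (number of sample uniques) / (sum of F).
    """
    pop_counts = Counter(population_keys)
    sample_counts = Counter(sample_keys)

    # The sample must be drawn from the population.
    for key, f in sample_counts.items():
        if pop_counts[key] < f:
            raise ValueError(f"key {key!r} occurs more often in sample than in population")

    # Keys that identify exactly one record in the released sample.
    unique_keys = [key for key, f in sample_counts.items() if f == 1]
    if not unique_keys:
        return None  # no unique matches are possible

    candidates = sum(pop_counts[key] for key in unique_keys)
    return len(unique_keys) / candidates


# Toy data: keys "a" and "b" are sample-unique; "a" has 2 population units, "b" has 1.
population = ["a", "a", "b", "c", "c", "c", "d"]
sample = ["a", "b", "c", "c"]
theta = match_correct_probability(population, sample)  # 2 / 3
```

In practice the population counts are unknown to the agency, which is why the abstract stresses that the measure can be estimated consistently from the sample alone.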
Similar resources
Disclosure Risk Measures for Microdata
In this paper, we define several disclosure risk measures for microdata. We will analyze disclosure risk based on the disclosure control techniques applied to initial microdata. Disclosure Control is the discipline concerned with the modification of data containing confidential information about individual entities, such as persons, households, businesses, etc. in order to prevent third parties...
Automatic Generation of Masked Microdata
Disclosure Control is the discipline concerned with the modification of data containing confidential information about individual entities, such as persons, households, businesses, etc. in order to prevent third parties working with these data from recognizing entities in the data and thereby disclosing information about these entities. In very broad terms, disclosure risk is the risk that a gi...
A CRONYM : Data without Boundaries D
Disclosure limitation methods for protecting the confidentiality of respondents in survey microdata often use perturbative techniques which introduce measurement error into the categorical identifying variables. In addition, the data itself will often have measurement errors commonly arising from survey processes. There is a need for valid and practical ways to assess the protect...
Global Disclosure Risk Measures and k-Anonymity Property for Microdata
In today’s world, governmental, public, and private institutions systematically release data which describes individual entities (commonly referred to as microdata). Those institutions are increasingly concerned with possible misuses of the data that might lead to disclosure of confidential information. Moreover, confidentiality regulation requires that privacy of individuals represented in the re...
Assessing Microdata Disclosure Risk Using the Poisson-inverse Gaussian Distribution
An important measure of identification risk associated with the release of microdata or large complex tables is the number or proportion of population units that can be uniquely identified by some set of characterizing attributes which partition the population into subpopulations or cells. Various methods for estimating this quantity based on sample data have been proposed in the literature by ...